Abstract

How can we model influence between individuals in a social system, even when the network 
of interactions is unknown? In this article, we review the literature on the "influence model," 
which utilizes independent time series to estimate how much the state of one actor affects the 
state of another actor in the system. We extend this model to incorporate dynamical parameters 
that allow us to infer how influence changes over time, and we provide three examples of 
how this model can be applied to simulated and real data. The results show that the model
can recover known estimates of influence, generates results that are consistent with other
measures of social networks, and allows us to uncover important shifts in the way states may
be transmitted between actors at different points in time.

1 Introduction 

The concept of influence is extraordinarily important in the natural sciences. The basic idea of
influence is that an outcome in one entity can cause an outcome in another entity. Flip over the first
domino, and the second domino will fall. If we understand exactly how two dominoes interact (how
one domino influences another), and we know the initial state of the dominoes and how they
are situated relative to one another, then we can predict the outcome of the whole system.

…and should not be interpreted as representing the official policies, either expressed or implied, of AFOSR, ARL or the
U.S. Government.

For decades, social scientists have also been interested in analyzing and understanding who
influences whom in social systems [1, 2, 3]. But the analogue with the physical world is not exact. In
the social world, influence can be more complicated because internal states are often unobservable, 
intentional behavior can be strategic, and the situational context of specific interactions can change 
the effect one actor has on another. And even more challenging, actors can choose with whom they 
interact, which can confound efforts to infer influence from correlated behaviors between actors 0. 
As a consequence, there has been tremendous interest in developing methods for better understand- 
ing the effect that networked interactions have on the spread of social behaviors and outcomes. 

Social scientists have already carefully studied communication settings like group discussions to 
better understand the causal mechanisms that underlie influence [4], but recent advances in modern 
sensing systems such as sociometric badges [5] and cell phones now provide valuable social 
behavioral signals from each individual at very high resolution in time and space. The challenge for 
those of us interested in signal processing is how to use this data to make better inferences about 
influence within social systems. 

In this article we describe the "influence model" first articulated in [7] and the subsequent
literature that has refined this approach. Similar definitions of influence appear in other literatures,
including research on voting models in physics [8], cascade models in epidemiology [1], attitude
influence in psychology [9], and information exchange models in economics [10]. The influence model is
built on an explicit abstract definition of influence: an entity's state is influenced by its network
neighbors' states and changes accordingly. Each entity in the network has a specifically defined
strength of influence over every other entity in the network, and, equivalently, each relationship can
be weighted according to this strength.

We believe that the influence model is a unique tool for social scientists because it can be applied 
to a wide range of social systems (including those where aggregates like organizations, states, and 
institutions can themselves be thought of as "actors" in a network). The influence model also enables 
researchers to infer interactions and dynamics when the network structure is unknown; all that is
needed is information about time series signals from individual observations. And although this
method is subject to the same limitations as any observational network study [11], the ordering of
behaviors in time and social space makes it less likely that alternative mechanisms like selection 
effects and contextual heterogeneity can explain the patterns of influence ascertained by the model. 

The rest of this article is organized in the following way. We first describe the influence model
in Section 2 and previous work in Section 3. In Section 4 we introduce the dynamical influence
model, a generalization of the influence model for changing network topology. We then discuss the
inference algorithms in Section 5. In Section 6 we give specific examples of its ability to recover
plausible and known influence pathways between entities in a network with real and artificial data.

2 Overview of the Influence Model

2.1 Entities in a Social System 

We describe the influence model here, followed by a review of its history in Section 3. The model
starts with a system of C entities. We assume that each entity c is associated with a finite set
of possible states $\{1, \dots, S\}$. At each time t, each entity c is in one of these states, denoted by
$h_t^{(c)} \in \{1, \dots, S\}$. It is not necessary for every entity to have the same set of possible
states; some entities can have more or fewer states. However, to simplify our description, we assume
without loss of generality that every entity's latent state space is the same.

The state of each entity is not directly observable. However, as in the Hidden Markov Model
(HMM), each entity emits a signal $O_t^{(c)}$ at time stamp t based on the current latent state $h_t^{(c)}$,
following a conditional emission probability $\mathrm{Prob}(O_t^{(c)} \mid h_t^{(c)})$. The emission probability can be
either multinomial or Gaussian for the discrete and continuous cases respectively, exactly as in the HMM
literature [12].
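As a minimal sketch of the two emission families, the hypothetical helpers below (not part of the original model description) return the likelihood of one observation under every hidden state. Hidden states are indexed from 0, and the Gaussian version assumes one mean per state with a shared variance.

```python
import numpy as np


def multinomial_emission(B: np.ndarray, obs: int) -> np.ndarray:
    """Discrete case: B[s, o] = Prob(O = o | h = s).

    Returns the likelihood of the observed symbol `obs` under each hidden state s.
    """
    return B[:, obs]


def gaussian_emission(means: np.ndarray, var: float, obs: float) -> np.ndarray:
    """Continuous case: one Gaussian per hidden state, all sharing variance `var`."""
    return np.exp(-((obs - means) ** 2) / (2.0 * var)) / np.sqrt(2.0 * np.pi * var)


B = np.array([[0.9, 0.1], [0.2, 0.8]])  # 2 hidden states, 2 observation symbols
print(multinomial_emission(B, 1))        # likelihood of symbol 1 under states 0 and 1
print(gaussian_emission(np.array([0.0, 1.0]), 1.0, 0.0))
```

In both cases the result is a length-S vector, which is the form a forward-backward style recursion consumes.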

It is important to note here that entities can be anything that has at least one state. For example, 
they could be people in group discussions who are in a "talking" state or a "silent" state. Or they 
could be geographical districts with variation in flu incidence that yields some in a "high incidence" 
state or a "low incidence" state. In any situation, the fundamental question remains: does the state
of one entity influence (cause a change in) the state of another entity? It is therefore possible to apply
the influence model to a wide range of contexts.
2.2 Influence between Entities

The influence model is composed of entities interacting with and influencing each other. "Influence"
is defined as the conditional dependence between each entity's current state $h_t^{(c')}$ at time t and the
previous states of all entities $h_{t-1}^{(1)}, \dots, h_{t-1}^{(C)}$ at time $t-1$. Therefore, intuitively, $h_t^{(c')}$ is influenced
by all entities in the system, including its own previous state.

An important implication of this Markovian assumption is that all effects from states at times
earlier than $t-1$ are completely accounted for by incorporating all information from time $t-1$.
This does not mean that earlier times had no effect or were unimportant; it just means that their
total effect is felt in the immediately preceding time period. Even path-dependent processes (of
which there are many in the social sciences) can operate this way, one time period at a time.

We now discuss the conditional probability:

$$\mathrm{Prob}\big(h_t^{(c')} \mid h_{t-1}^{(1)}, \dots, h_{t-1}^{(C)}\big). \qquad (1)$$

Once we have $\mathrm{Prob}(h_t^{(c')} \mid h_{t-1}^{(1)}, \dots, h_{t-1}^{(C)})$, we naturally obtain a generative stochastic process.
As in the coupled Markov model [13], we can take a general combinatorial approach to Eq. 1 and
convert this model to an equivalent Hidden Markov Model (HMM), in which each distinct latent
state combination $(h_{t-1}^{(1)}, \dots, h_{t-1}^{(C)})$ is represented by a unique state. Therefore, for a system with
C interacting entities, the equivalent HMM has a latent state space of size $S^C$, exponential in
the number of entities in the system, which creates insurmountable computational challenges in
real applications.
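The combinatorial conversion can be sketched as follows. This is an illustrative construction, assuming (as in the influence model's graphical structure) that entities' next states are conditionally independent given all previous states; `cond` is a hypothetical callback supplying Eq. 1 for each entity. Enumerating every joint state makes the $S^C$ blow-up explicit.

```python
import itertools

import numpy as np


def joint_transition(cond, C: int, S: int) -> np.ndarray:
    """Build the S**C x S**C transition matrix of the equivalent HMM.

    cond(c_prime, prev) must return a length-S probability vector for entity
    c_prime's next state, given the tuple `prev` of all C previous states.
    """
    states = list(itertools.product(range(S), repeat=C))  # all S**C joint states
    T = np.zeros((len(states), len(states)))
    for i, prev in enumerate(states):
        per_entity = [cond(cp, prev) for cp in range(C)]
        for j, nxt in enumerate(states):
            # Next states are conditionally independent given the previous joint state.
            T[i, j] = np.prod([per_entity[cp][nxt[cp]] for cp in range(C)])
    return T


# Every entity keeps its own state: the joint chain is the identity on 2**3 = 8 states.
T = joint_transition(lambda cp, prev: np.eye(2)[prev[cp]], C=3, S=2)
print(T.shape)  # (8, 8)
```

With only ten binary entities the joint matrix already has 1024 × 1024 entries, which is why the influence model avoids this representation.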

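The influence model, on the other hand, uses a much simpler mixture approach with far fewer
parameters. Entities $1, \dots, C$ influence the state of $c'$ in the following way:

$$\mathrm{Prob}\big(h_t^{(c')} \mid h_{t-1}^{(1)}, \dots, h_{t-1}^{(C)}\big) = \sum_{c \in \{1,\dots,C\}} \underbrace{R_{c',c}}_{\text{tie strength}} \times \underbrace{\mathrm{Infl}\big(h_t^{(c')} \mid h_{t-1}^{(c)}\big)}_{\text{influence } c \to c'}, \qquad (2)$$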

where R is a $C \times C$ matrix ($R_{c_1,c_2}$ represents the element at the $c_1$-th row and the $c_2$-th column of
the matrix R). R is row stochastic, i.e., each row of this matrix sums to one. $\mathrm{Infl}(h_t^{(c')} \mid h_{t-1}^{(c)})$ is
modeled using an $S \times S$ row stochastic matrix $M^{c,c'}$, so that $\mathrm{Infl}(h_t^{(c')} \mid h_{t-1}^{(c)}) = M^{c,c'}_{h_{t-1}^{(c)},\, h_t^{(c')}}$, where
$M^{c,c'}_{h_{t-1}^{(c)},\, h_t^{(c')}}$ represents the element at the $h_{t-1}^{(c)}$-th row and $h_t^{(c')}$-th column of matrix $M^{c,c'}$. The row
stochastic matrix $M^{c,c'}$ captures the influence of c over c', and is very similar to the transition
matrix in the HMM literature [12].

Eq. 2 can be viewed as follows: all entities' states at time $t-1$ influence the state of entity
$c'$ at time t. However, the strength of influence differs across entities: the strength of c
over $c'$ is captured by $R_{c',c}$. As a result, the state distribution of entity $c'$ at time t is a combination
of influence from all entities, weighted by their strength over $c'$. This definition of influence
from neighboring nodes is not unique to the influence model; it has been well studied in statistical
physics and psychology as well [8, 9]. Because R captures the influence strength between any two entities,
we refer to R as the Influence Matrix.
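A minimal sketch of Eq. 2 follows, assuming numpy arrays with 0-based states. The helper name is hypothetical, and the array `M[c, c']` is an assumed dense layout for $M^{c,c'}$, so that row `prev[c]` is the influence distribution $\mathrm{Infl}(\cdot \mid h_{t-1}^{(c)})$.

```python
import numpy as np


def next_state_dist(R: np.ndarray, M: np.ndarray, prev, c_prime: int) -> np.ndarray:
    """Eq. 2: Prob(h_t^(c') | h_{t-1}^(1..C)) as an R-weighted mixture.

    R:    (C, C) row-stochastic influence matrix; R[c', c] is the strength of c over c'.
    M:    (C, C, S, S) array; M[c, c'] is the row-stochastic S x S matrix M^{c,c'}.
    prev: length-C sequence of previous states h_{t-1}^(c), indexed 0..S-1.
    """
    C = R.shape[0]
    # Row prev[c] of M^{c,c'} gives Infl(h_t^(c') | h_{t-1}^(c)) for every value of h_t^(c').
    return sum(R[c_prime, c] * M[c, c_prime, prev[c], :] for c in range(C))


R = np.array([[0.7, 0.3], [0.4, 0.6]])
M = np.tile(np.eye(2), (2, 2, 1, 1))  # each entity pushes others toward its own state
print(next_state_dist(R, M, prev=(0, 1), c_prime=0))  # mixture 0.7 * [1, 0] + 0.3 * [0, 1]
```

Because each row of R and each row of every $M^{c,c'}$ sums to one, the returned mixture is itself a valid probability distribution.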

Generally, for each entity c, there are C different transition matrices in the influence model to
account for the influence dynamics between c and c', c' = 1, ..., C. However, the model can be simplified
by replacing these C matrices with only two $S \times S$ matrices $E^c$ and $F^c$: $E^c = M^{c,c}$,
which captures the self-state transition, and $F^c$, which captures how c influences the other entities.
Empirically, in many systems an entity c influences other entities in the same manner. For instance,
a strong politician always asks everyone to support his political view no matter who they are.
Therefore, we can often simplify the system by assuming $M^{c,c'} = F^c, \forall c' \neq c$.
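Under the simplification $M^{c,c'} = F^c$ for $c' \neq c$, the full set of influence matrices can be rebuilt from $E^c$ and $F^c$ alone. The sketch below does this with a hypothetical helper whose output uses an assumed dense `(C, C, S, S)` layout, `M[c, c']` for $M^{c,c'}$. It stores $2CS^2$ numbers instead of $C^2 S^2$.

```python
import numpy as np


def build_M(E: np.ndarray, F: np.ndarray) -> np.ndarray:
    """Expand the simplified parameters into the full (C, C, S, S) tensor.

    E: (C, S, S) self-transition matrices, M^{c,c} = E^c.
    F: (C, S, S) outgoing-influence matrices, M^{c,c'} = F^c for every c' != c.
    """
    C = E.shape[0]
    M = np.empty((C, C) + E.shape[1:])
    for c in range(C):
        for cp in range(C):
            M[c, cp] = E[c] if c == cp else F[c]
    return M


E = np.stack([np.eye(2)] * 3)  # three entities that keep their own state
F = np.full((3, 2, 2), 0.5)    # and push others toward a uniform state
print(build_M(E, F).shape)      # (3, 3, 2, 2)
```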

2.3 Inference 

The influence model is a generative model defined by the parameters $R$, $E^{1:C}$, $F^{1:C}$ and the emission
probabilities $\mathrm{Prob}(O_t^{(c)} \mid h_t^{(c)}), \forall c$. As in most generative machine learning models, these parameters
are not set by users; they are learned automatically from the observations $O_{1:T}^{(1)}, \dots, O_{1:T}^{(C)}$. The
inference algorithms for learning these parameters are discussed in Section 5.

The influence model has two key advantages over other machine learning approaches. First, the
number of parameters grows only quadratically with the latent space size S and polynomially, rather
than exponentially, with the number of entities C: the matrices $E^c$ and $F^c$ contribute $2CS^2$ entries,
which is linear in C, and the influence matrix R contributes $C^2$. As a result, the influence model is
resistant to overfitting when training data are limited, compared with other approaches [14].
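The difference in scale can be checked with a little arithmetic. The hypothetical helper below counts free transition parameters only (each row of a stochastic matrix has one fewer free value than its length) and ignores emission and initial-state parameters. It assumes the simplified $E^c$/$F^c$ parameterization.

```python
def param_counts(C: int, S: int) -> dict:
    """Free transition parameters: equivalent HMM vs. influence model (E/F form)."""
    return {
        # S**C joint states, each with a distribution over S**C successors
        "joint_hmm": S**C * (S**C - 1),
        # R (C rows of length C) plus E^c and F^c (S rows of length S each) per entity
        "influence_model": C * (C - 1) + 2 * C * S * (S - 1),
    }


for C in (2, 5, 10):
    print(C, param_counts(C, S=2))
```

With ten binary entities, the equivalent HMM already needs over a million transition parameters, while the influence model needs 130.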

Second, the model captures the tie strength between entities using a $C \times C$ matrix R. The R
inferred by our model can be naturally treated as the adjacency matrix of a directed weighted
graph between nodes. This key contribution connects conditional probabilistic dependence to a
weighted network topology. In fact, the most common use of the influence model in the literature
is to infer social structure from R [15, 16].
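Reading R as a graph requires care with the index order: since $R_{c',c}$ is the strength of c over c', the edge points from c to c'. A small sketch follows. The helper name and the node labels are hypothetical.

```python
import numpy as np


def influence_edges(R: np.ndarray, names, min_weight: float = 0.0):
    """List directed edges (source, target, weight) from an influence matrix.

    R[c', c] is the strength of c over c', so the edge runs from c to c'.
    Self-loops (the diagonal) and weights <= min_weight are skipped.
    """
    edges = []
    C = R.shape[0]
    for target in range(C):
        for source in range(C):
            w = float(R[target, source])
            if source != target and w > min_weight:
                edges.append((names[source], names[target], w))
    return sorted(edges, key=lambda e: -e[2])  # strongest influence first


R = np.array([[0.5, 0.5], [0.2, 0.8]])
print(influence_edges(R, ["A", "B"]))
```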

3 Previous Applications of the Influence Model 

The influence model has been applied to various social systems, particularly those that have been
monitored by sociometric badges like those shown in Fig. 1. These badges are personal devices
that collect individual behavioral data including audio, location, and movement. Early attempts
to analyze data from these badges focused on questions revolving around group interaction and
interpersonal influence.




Figure 1: Different versions of the sociometric badge are shown on the left and in the middle. The
sociometric badge is a wearable sensing device for collecting individual behavioral data. On the
right is a group brainstorming session in which all participants were wearing sociometric badges.

The first application of the influence model [7] attempted to infer influence networks from audio
recordings of a group discussion session with five individuals. The researchers used audio features
as observations $O_t^{(c)}$ and modeled the latent state space to be either "speaking" or "non-speaking".
They then used the model to infer the underlying pattern of interpersonal influence from the noisy
signals measured directly from each individual and from their interactions in turn taking.

An important question about these inferences relates to their validity: how do we know that the
measure of influence is real? Another set of researchers applied the influence model to conversation
data from sociometric badges worn by 23 individuals and showed that the influence strength between
individuals learned by the model correlated extremely well with individual centrality in their social
networks (R = 0.92, p < 0.0004) [16]. This evidence suggests that the influence matrix, defined as the
weights in the conditional dependence on the states of other entities, is an important measure of the
social position of individuals in real interaction data. In other words, even more abstract concepts
related to influence, like status or social hierarchy, might be captured by the inferences of the
influence model.

The model has also been applied to many other human interaction contexts [13]. For instance,
researchers have used the influence model to understand the functional role (follower, orienteer,
giver, seeker, etc.) of each individual in the mission survival group discussion dataset [17]. They
found that the inferred influence matrix helped them achieve better classification accuracy than
other approaches. The model has also been applied to the Reality Mining [6] cellphone
sensor data. Using information from 80 MIT affiliates as observations and constraining
the latent space of each individual to be binary ("work" and "home"), researchers found that the
influence matrix learned from this data matches well with the organizational relationship between
individuals!!!!. 

Recently the influence model has been extended to a variety of systems, including traffic
patterns [18] and flu outbreaks [19]. More importantly, there have been methodological advances
that allow the model to incorporate dynamic changes in the influence matrix itself [19]. This new
approach, the Dynamical Influence Model, is a generalization of the influence model and is discussed
in the following section.

Related approaches have utilized Bayesian networks to understand and process social interaction
time series data. Examples include coupled HMMs [13], dynamic system trees, and interacting
Markov chains [20]. The key difference between these approaches and the influence model is that
the influence matrix R connects the real network to state dependence.

The key idea of the influence model is to define influence as the dependence of an entity's state
on the weighted sum of states from network neighbors. This idea has been extensively explored by
statistical physicists [8], and very recently by psychologists in modeling attitude influence [9].

4 The Dynamical Influence Model 

Above, we introduced the influence model, in which the influence strength matrix R remains the
same for all t. However, there is extensive evidence leading us to think that influence is in fact a
dynamical process [21]. This can also be seen from many real-world experiences: friendship is not
static, and in negotiations your most active opponent may change over time due to shifts in topics
or strategies. Therefore, we believe that the influence between subjects may fluctuate as well in many
social systems.

Here, we demonstrate how the influence model can be extended to the dynamical case, and
we call this generalization the Dynamical Influence Model. Instead of having one single influence
strength matrix R, we consider a finite set of different influence strength matrices, $\{R(1), \dots, R(J)\}$,
each representing a different pattern of interaction between entities. J is a hyperparameter set by users
to define the number of different interaction patterns. Our approach is essentially a switching model,
and we also introduce the switching latent state $r_t \in \{1, \dots, J\}$, $t = 1, \dots, T$, which indicates the
active influence matrix at time t. Therefore, Eq. 2 becomes:

$$\mathrm{Prob}\big(h_t^{(c')} \mid h_{t-1}^{(1)}, \dots, h_{t-1}^{(C)}, r_t\big) = \sum_{c \in \{1,\dots,C\}} R(r_t)_{c',c} \times \mathrm{Infl}\big(h_t^{(c')} \mid h_{t-1}^{(c)}\big). \qquad (3)$$

As $r_t$ switches among the values 1 to J at different times t, the dynamics are determined by
different influence matrices $R(r_t)$.



As shown in Section 6.1, it is very important to constrain the switching of $r_t$ for two
reasons: (a) in many social systems, influence patterns change slowly and gradually; and
(b) a prior reduces the risk of overfitting. Therefore, we introduce the following prior for $r_t$:

$$r_{t+1} \mid r_t \sim \mathrm{Multinomial}\big(V_{r_t,1}, \dots, V_{r_t,J}\big), \qquad (4)$$

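where V is a system parameter matrix constrained by another hyperparameter $p_v$, $p_v \geq 0$. The
prior is shown in Eq. 5:

$$\big(V_{r_t,1}, \dots, V_{r_t,J}\big) \sim \mathrm{Dirichlet}\big(10^0, 10^0, \dots, 10^{p_v}, \dots, 10^0\big), \qquad (5)$$

where $10^{p_v}$ is the $r_t$-th of the J concentration parameters and all the others equal $10^0 = 1$.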

This prior provides better control over the process $r_1, \dots, r_T$. When $p_v = 0$, the Dirichlet prior
reduces to a uniform distribution. The higher $p_v$ gets, the more likely $r_{t-1}$ and $r_t$ are to be the
same.
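The effect of $p_v$ can be checked numerically. The sketch below builds the concentration parameters of Eq. 5, computes the prior mean of each row of V, and simulates $r_1, \dots, r_T$ under Eq. 4. The uniform choice of $r_1$ and the function names are assumptions made for illustration.

```python
import numpy as np


def dirichlet_params(J: int, p_v: float) -> np.ndarray:
    """Eq. 5: row i of V has concentration 10**p_v on entry i and 10**0 = 1 elsewhere."""
    return np.ones((J, J)) + (10.0**p_v - 1.0) * np.eye(J)


def prior_mean_V(J: int, p_v: float) -> np.ndarray:
    """Expected value of V under Eq. 5 (Dirichlet mean = alpha / alpha.sum())."""
    alpha = dirichlet_params(J, p_v)
    return alpha / alpha.sum(axis=1, keepdims=True)


def sample_switches(J: int, p_v: float, T: int, seed: int = 0) -> list:
    """Draw V from the prior, then simulate r_1..r_T with Eq. 4 (r_1 uniform, an assumption)."""
    rng = np.random.default_rng(seed)
    alpha = dirichlet_params(J, p_v)
    V = np.vstack([rng.dirichlet(alpha[i]) for i in range(J)])
    r = [int(rng.integers(J))]
    for _ in range(T - 1):
        r.append(int(rng.choice(J, p=V[r[-1]])))
    return r


for p_v in (0, 1, 2):
    # prior probability of staying in the same configuration from one step to the next
    print(p_v, prior_mean_V(3, p_v)[0, 0])
```

With J = 3, the prior probability of staying in the same configuration rises from 1/3 at $p_v = 0$ to 10/12 at $p_v = 1$ and 100/102 at $p_v = 2$, which is the "stickiness" the text describes.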
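Given the model description and hyperparameters J and $p_v$, we can then write the likelihood
function:

$$\mathcal{L}\big(O_{1:T}^{1:C}, h_{1:T}^{1:C}, r_{1:T} \mid E^{1:C}, F^{1:C}, R(1{:}J), V\big) = \prod_{t=2}^{T} \Big\{ \mathrm{Prob}(r_t \mid r_{t-1}) \times \prod_{c=1}^{C} \Big[ \mathrm{Prob}\big(O_t^{(c)} \mid h_t^{(c)}\big) \times \mathrm{Prob}\big(h_t^{(c)} \mid h_{t-1}^{(1:C)}, r_t\big) \Big] \Big\} \qquad (6)$$

$$\times\; \mathrm{Prob}(r_1) \prod_{c=1}^{C} \mathrm{Prob}\big(O_1^{(c)} \mid h_1^{(c)}\big)\, \mathrm{Prob}\big(h_1^{(c)}\big). \qquad (7)$$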
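To make Eqs. 6-7 concrete, the sketch below evaluates the complete-data log-likelihood when observations, hidden states and switch states are all given. It is a hypothetical helper with 0-based indices and multinomial emissions. During actual inference the hidden variables are unknown and are handled by the E-step.

```python
import numpy as np


def complete_log_likelihood(O, h, r, V, R, M, B, pi_h, pi_r) -> float:
    """log of Eqs. 6-7 for fully observed O, h and r (all indices 0-based).

    O, h: (T, C) int arrays of observations and hidden states; r: (T,) switch states.
    V: (J, J) switching matrix; R: (J, C, C) stack of R(1..J);
    M: (C, C, S, S) with M[c, c'] = M^{c,c'}; B: (C, S, K) multinomial emissions;
    pi_h: (C, S) initial-state distributions; pi_r: (J,) initial switch distribution.
    """
    T, C = h.shape
    # Eq. 7: terms for the first time step
    ll = np.log(pi_r[r[0]])
    for c in range(C):
        ll += np.log(B[c, h[0, c], O[0, c]]) + np.log(pi_h[c, h[0, c]])
    # Eq. 6: product over t = 2..T (0-based t = 1..T-1)
    for t in range(1, T):
        ll += np.log(V[r[t - 1], r[t]])
        for cp in range(C):
            # Eq. 3: mixture over source entities, weighted by the active R(r_t)
            trans = sum(R[r[t], cp, c] * M[c, cp, h[t - 1, c], h[t, cp]] for c in range(C))
            ll += np.log(B[cp, h[t, cp], O[t, cp]]) + np.log(trans)
    return float(ll)


# One entity, one configuration, two time steps: the result is log(0.7 * 0.5 * 0.6 * 0.1).
ll = complete_log_likelihood(
    O=np.array([[0], [1]]), h=np.array([[0], [1]]), r=np.array([0, 0]),
    V=np.array([[1.0]]), R=np.array([[[1.0]]]),
    M=np.array([[[[0.9, 0.1], [0.2, 0.8]]]]),
    B=np.array([[[0.7, 0.3], [0.4, 0.6]]]),
    pi_h=np.array([[0.5, 0.5]]), pi_r=np.array([1.0]),
)
print(ll)
```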



To demonstrate the difference between the static influence model and the dynamical influence
model, we illustrate the Bayesian graph for both models in Fig. 2.



[Figure 2 panels: latent trace for network structure; Entity 1; Entity 2; Dynamical Influence Model]



Figure 2: A graphical representation of our model when C = 2. The blue lines show the dependence
of the influence model described in Section 2. The red lines indicate the layer that brings additional
switching capacity to the influence model, and together with the blue lines they fully describe the
variable dependence of the dynamical influence model.



Researchers have been studying a variety of alternative time-varying network models, from
ERGM [22] to TESLA [23]. ERGM computes a set of features from networks and how they change,
and models the distribution of network evolution as the distribution of feature evolution. TESLA
uses changing network edges to capture correlations in node observations, with $\ell_1$ constraints on
edges to ensure sparsity. Another recent approach learns network topology from cascades [24].
Compared with these models, the dynamical influence model serves as a unique generative approach
for modeling noisy signals from a dynamical network.
5 Inference

In signal processing applications, we are given the observation time series signals $O_{1:T}^{(1)}, \dots, O_{1:T}^{(C)}$,
and based on these observations we need to learn the distributions of the underlying latent variables
and the system parameters of the dynamical influence model. The inference process for our model is
discussed here. Since the dynamical influence model is a generalization of the influence model, the
following description is applicable to both models.

Previously, researchers started with a standard exact inference algorithm (Junction Tree) with
exponential complexity, and then moved to an approach based on optimization [10]. Other scholars
gradually moved to an approximation approach based on the Forward-Backward algorithm and
variational EM [25], [19]. The influence model can also be trained via other approximations like the
mean-field method [26].

Here we show some key steps of the variational E-M approach, which has been developed and
applied successfully on many datasets. We refer readers to Pan et al. [19] for details. We use = for
definitions and ∼ for equality in distribution; the right-hand side of every equation should be
normalized accordingly.

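E-Step: We adopt a procedure similar to the forward-backward procedure in the HMM literature.
First, we define the following forward parameters for $t = 1, \dots, T$:

$$\alpha_t^{r_t,c} = \mathrm{Prob}\big(h_t^{(c)} \mid r_t, O_{1:t}\big), \qquad \kappa_t = \mathrm{Prob}\big(r_t \mid O_{1:t}\big), \qquad (8)$$

where $O_{1:t}$ denotes $\{O_{t'}^{(c)}\}_{t'=1,\dots,t;\; c=1,\dots,C}$. However, the complexity of computing $\alpha_t^{r_t,1:C}$ from
$\alpha_{t-1}^{r_{t-1},1:C}$ grows exponentially with respect to C, so we adopt the variational approach [27]; E-M is still
guaranteed to converge under the variational approximation [27]. We proceed to decouple the chains by:

$$\mathrm{Prob}\big(h_t^{(1)}, \dots, h_t^{(C)} \mid O_{1:t}, r_t\big) \approx \prod_{c} Q\big(h_t^{(c)} \mid O_{1:t}, r_t\big), \qquad (9)$$

and naturally:

$$\alpha_t^{r_t,c} \approx Q\big(h_t^{(c)} \mid O_{1:t}, r_t\big). \qquad (10)$$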
The approximation adopted here enables us to run inference in polynomial time. Based on this
approximation, starting with $\alpha_1^{r_1,c}$ and $\kappa_1$, we can compute $\alpha_t^{r_t,c}$ and $\kappa_t$, $\forall t = 2, \dots, T$, step by step in
a forward manner.
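The forward pass over the switching variable follows the usual HMM filtering pattern. The sketch below is a generic filter for $\kappa_t$, assuming the per-step evidence terms $\mathrm{Prob}(O_t \mid r_t = j, O_{1:t-1})$ have already been computed from the factorized chain beliefs. How those terms are obtained is left to the full algorithm in [19], and this is not presented as its exact update.

```python
import numpy as np


def forward_switch_filter(evidence: np.ndarray, V: np.ndarray, kappa1: np.ndarray) -> np.ndarray:
    """Filter kappa_t = Prob(r_t | O_1:t) for the switching chain.

    evidence: (T, J) array, evidence[t, j] ~ Prob(O_t | r_t = j, O_1:t-1)
              (assumed precomputed, e.g. from the factorized chain beliefs).
    V: (J, J) switching matrix, V[i, j] = Prob(r_{t+1} = j | r_t = i).
    kappa1: (J,) prior for r_1 before seeing O_1.
    """
    T, J = evidence.shape
    kappa = np.zeros((T, J))
    k = kappa1 * evidence[0]
    kappa[0] = k / k.sum()
    for t in range(1, T):
        predicted = kappa[t - 1] @ V   # Prob(r_t | O_1:t-1)
        k = predicted * evidence[t]    # incorporate O_t
        kappa[t] = k / k.sum()         # normalize, as noted in the text
    return kappa


kappa = forward_switch_filter(
    evidence=np.array([[1.0, 1.0], [1.0, 3.0]]),
    V=np.array([[0.9, 0.1], [0.1, 0.9]]),
    kappa1=np.array([0.5, 0.5]),
)
print(kappa)
```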

Using the same idea, we can compute the following backward parameters for all t in backward
order (i.e., start with t = T, then compute $\beta_t^{r_t,c}$ and $\nu_t$ for $t = T-1, T-2, \dots, 1$):

$$\beta_t^{r_t,c} = \mathrm{Prob}\big(h_t^{(c)} \mid r_t, O_{t:T}\big), \qquad \nu_t = \mathrm{Prob}\big(r_t \mid O_{t:T}\big). \qquad (11)$$

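M-step: With $\kappa_t$ and $\nu_t$, we can estimate:

$$\xi_t^{i,j} = \mathrm{Prob}\big(r_t = i, r_{t+1} = j \mid O_{1:T}\big) = \frac{\mathrm{Prob}(r_t = i \mid O_{1:t})\, \mathrm{Prob}(r_{t+1} = j \mid O_{t+1:T})\, \mathrm{Prob}(r_{t+1} = j \mid r_t = i)}{\sum_{i',j'} \mathrm{Prob}(r_t = i' \mid O_{1:t})\, \mathrm{Prob}(r_{t+1} = j' \mid O_{t+1:T})\, \mathrm{Prob}(r_{t+1} = j' \mid r_t = i')}, \qquad (12)$$

$$\gamma_t^{i} = \mathrm{Prob}\big(r_t = i \mid O_{1:T}\big) = \sum_{j} \xi_t^{i,j}, \qquad (13)$$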



and update V by:

$$V_{i,j} \leftarrow \sum_{t} \xi_t^{i,j} + \lambda_{i,j}, \qquad (14)$$

where $\lambda_{i,j} = p_v$ if $i = j$, otherwise …

We then compute the joint distribution $\mathrm{Prob}\big(h_{t+1}^{(c')}, h_t^{(c)}, r_{t+1} \mid O_{1:T}\big)$, and update parameters
such as the influence matrices $R(1), \dots, R(J)$, $E^c$ and $F^c$ by marginalizing this joint distribution.
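The switching-chain part of the M-step (Eqs. 12-14) vectorizes naturally. This is a sketch under stated assumptions: `kappa` and `nu` come from the forward and backward passes, and the V update adds a caller-supplied pseudo-count matrix `lam` to the expected transition counts and then renormalizes each row, following the normalization convention stated at the start of this section. The helper name is hypothetical.

```python
import numpy as np


def m_step_switching(kappa: np.ndarray, nu: np.ndarray, V: np.ndarray, lam: np.ndarray):
    """Eqs. 12-14 for the switching chain.

    kappa: (T, J) forward beliefs Prob(r_t | O_1:t); nu: (T, J) backward beliefs
    Prob(r_t | O_t:T); V: (J, J) current switching matrix; lam: (J, J) pseudo-counts.
    Returns (xi, gamma, V_new) with xi of shape (T-1, J, J) and gamma of shape (T-1, J).
    """
    # Eq. 12: xi[t, i, j] proportional to kappa[t, i] * nu[t+1, j] * V[i, j], normalized per t
    xi = kappa[:-1, :, None] * nu[1:, None, :] * V[None, :, :]
    xi /= xi.sum(axis=(1, 2), keepdims=True)
    # Eq. 13: gamma[t, i] = sum_j xi[t, i, j]
    gamma = xi.sum(axis=2)
    # Eq. 14: expected transition counts plus pseudo-counts, then row normalization
    counts = xi.sum(axis=0) + lam
    V_new = counts / counts.sum(axis=1, keepdims=True)
    return xi, gamma, V_new


kappa = nu = np.full((3, 2), 0.5)
V = np.full((2, 2), 0.5)
xi, gamma, V_new = m_step_switching(kappa, nu, V, lam=10.0 * np.eye(2))
print(V_new)  # a large diagonal pseudo-count makes staying in the same configuration likely
```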



6 Applications 

6.1 Toy Example: Two Interacting Agents 

In this example, we demonstrate how the dynamical influence model can be applied to find struc- 
tural changes in network dynamics. As a tutorial, we also explain how readers should adjust two 
hyperparameters J and p v in using this model. 

From a dynamical influence process composed of two interacting entities, we sample two binary
time series of 600 steps. Each chain has two hidden states with a random transition biased to
remain in the current state. We sample binary observations from a randomly generated multinomial
distribution. To simulate a switch in influence dynamics, we sample with influence matrix R(1)
(shown in Table 1) in the first 200 frames, and with influence matrix R(2) afterwards. We
purposely make the two configuration matrices different from each other. Partial data are shown in
Table 1 (left). We use the algorithm in Section 5 to infer the dynamical influence model's
parameters $V$, $R(1{:}J)$, $E^{1:C}$, $F^{1:C}$. All parameters (including the emission distribution) are initialized
randomly and estimated automatically during the E-M process.
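A generator in the spirit of this toy setup is sketched below. Only R(1) and R(2) are taken from Table 1. The within-chain transition matrix, the use of one shared $M^{c,c'}$ for every pair, and the 0/1 coding of observations (Table 1 prints them as 1/2) are illustrative assumptions.

```python
import numpy as np


def sample_toy(T: int = 600, switch: int = 200, seed: int = 0):
    """Two-chain toy process: R(1) for t < switch, R(2) afterwards (values from Table 1)."""
    rng = np.random.default_rng(seed)
    R1 = np.array([[0.90, 0.10], [0.10, 0.90]])
    R2 = np.array([[0.05, 0.95], [0.95, 0.05]])
    # Hypothetical chain dynamics: transitions biased toward staying in the same state.
    stay = np.array([[0.8, 0.2], [0.2, 0.8]])
    M = np.tile(stay, (2, 2, 1, 1))              # same M^{c,c'} for every pair (an assumption)
    B = rng.dirichlet(np.ones(2), size=(2, 2))   # random binary emissions, B[c, s, o]
    h = np.zeros((T, 2), dtype=int)
    O = np.zeros((T, 2), dtype=int)
    h[0] = rng.integers(2, size=2)
    for t in range(T):
        if t > 0:
            R = R1 if t < switch else R2
            for cp in range(2):
                # Eq. 2 with the currently active influence matrix
                p = sum(R[cp, c] * M[c, cp, h[t - 1, c]] for c in range(2))
                h[t, cp] = rng.choice(2, p=p)
        for c in range(2):
            O[t, c] = rng.choice(2, p=B[c, h[t, c]])
    return h, O


h, O = sample_toy()
print(O[:15, 0], O[:15, 1])
```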

Table 1: Left: Part of the two input toy sequences for a two-chain dynamical influence process.
Right: The original two influence matrices of the toy model and the same matrices learned by our
algorithm with J = 3 and $p_v = 10^1$.

Left:
SEQ. NO. | DATA (PARTIAL)
1        | 221111121212212...
2        | 112111212121122...

Right:
         | R(1)                   | R(2)
True     | [0.90 0.10; 0.10 0.90] | [0.05 0.95; 0.95 0.05]
Learned  | [0.93 0.07; 0.10 0.89] | [0.08 0.92; 0.94 0.06]

Choosing hyperparameters: We now discuss the selection of the hyperparameters J and $p_v$. For
the number of active influence matrices J, we illustrate its effect by running the same
example with J = 3. We show the posterior distribution of $r_t$ (calculated in Eq. 13) in Fig. 3(a). The
dynamical influence model accurately discovers the sudden change of influence weights at t = 200.
Since the toy process only has two true configuration matrices, the posterior probability of the third
configuration being active is almost zero for every t; the system properties are fully captured by the
other two configuration matrices during training. The learned configuration matrices (shown
in Table 1) are correctly recovered. Based on Fig. 3(a) and experiments with other values of J
(which we cannot show here due to space limitations), we suggest that readers gradually
increase J until a newly added configuration matrix is no longer useful in capturing additional
dynamical information from the data, i.e., until one configuration has a posterior probability that
stays near zero for all t, as in the right plot of Fig. 3(a).



We also demonstrate the convergence of the K-L divergence between the true distributions of the
transition probability and the learned distributions in Fig. 3(b), for different values of $p_v$. As can
be seen in Fig. 3(b), the algorithm converges quickly, within 50 iterations. However, when $p_v$ is
small, we may encounter overfitting, where the learned model rapidly switches between different
configurations to best suit the data. Therefore, in Fig. 3(b) the divergence for $p_v$ = … remains
higher than for other $p_v$ values at convergence. In conclusion, we advise users to increase $p_v$ gradually
until the posterior of $r_t$ no longer fluctuates.

Figure 3: (a): The posterior of $r_t$ with J = 3 after convergence. The black vertical line in the middle
of the left plot indicates the true switch in $r_t$. The probabilities of R(1) and R(2) being active are
shown in the left plot; R(3), shown in the right plot, remains inactive. (b): The K-L divergence
between the learned parameters and the true distributions as a function of the number of iterations.



6.2 Modeling Dynamical Influence in Group Discussions 
6.2.1 Dataset Description and Preprocessing 

Researchers in [5] recruited 40 groups with four subjects in each group for this experiment. During
the experiment, each subject was required to wear a sociometric badge around the neck for audio
recording (illustrated in the right picture of Fig. 1), and each group was required to perform two
different group discussion tasks: a brainstorming task (referred to as BS) and a problem solving task
(referred to as PS). Each task usually lasted 3 to 10 minutes. We refer readers to the original
paper [5] for details on data collection and experiment preparation.

The groups were asked to perform these tasks in two different settings: one in which people
were co-located in the same room around a table (referred to as CO), and one in which two pairs of
people were placed in two separate rooms with only audio communication available between them
(referred to as DS). The badges were used in both settings to collect audio. We separated all
samples according to their tasks (BS/PS) and their settings (CO/DS), and ended up with four
categories: DS+BS, DS+PS, CO+BS, CO+PS. Since discussions were held in four-person groups,
each sample for a discussion session is composed of four sequences collected by the four badges on
the participants' chests. The audio sequence picked up by each badge was split into one-second blocks,
and the variance of speech energy was calculated for each block. We then applied a hard threshold to
convert these variances into binary sequences. In all experiments, we only used binary sequences as data
input.
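The preprocessing pipeline can be sketched as follows, assuming the speech-energy signal is available as an array sampled at `rate` samples per second. The threshold value and the handling of a trailing partial second are not specified in the text, so both are assumptions here, and the helper name is hypothetical.

```python
import numpy as np


def binarize_speech(energy, rate: int, threshold: float) -> np.ndarray:
    """Turn a per-sample speech-energy signal into one binary value per second.

    energy: 1-D array of speech-energy samples; rate: samples per second;
    threshold: hypothetical hard threshold on the per-block variance.
    Trailing samples that do not fill a whole second are dropped.
    """
    n_blocks = len(energy) // rate
    blocks = np.asarray(energy[: n_blocks * rate], dtype=float).reshape(n_blocks, rate)
    variances = blocks.var(axis=1)  # one variance per one-second block
    return (variances > threshold).astype(int)


energy = np.concatenate([np.zeros(4), [0.0, 1.0, 0.0, 1.0], np.ones(4), [5.0]])
print(binarize_speech(energy, rate=4, threshold=0.1))
```

Using the variance rather than the mean energy means that a constant background level (the third block above) is treated as silence.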

6.2.2 Predicting Turn Taking in Discussion 

Here we explain an application of the dynamical influence model to predicting turn taking, and we
show that it is possible to achieve good prediction accuracy given only the audio volume variance
observations, with no information from the audio content.

Ten occurrences of turn taking behavior from each sample are selected for prediction purposes.
"Turn taking" here is defined as instances in which the current speaker ceases speaking and
another speaker starts to speak.
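One simple way to operationalize this definition on the binary speaking sequences is sketched below. It is an assumption for illustration, not the selection procedure used in the experiments, and it ignores turn exchanges separated by a silent second.

```python
def turn_taking_events(speaking):
    """Find turn-taking instants in a list of per-second speaking sets.

    speaking[t] is the set of participants judged to be speaking at second t.
    An event at t means someone speaking at t-1 stopped at t, and someone who
    was silent at t-1 started at t. Returns (t, stopped, started) tuples.
    """
    events = []
    for t in range(1, len(speaking)):
        stopped = speaking[t - 1] - speaking[t]
        started = speaking[t] - speaking[t - 1]
        if stopped and started:
            events.append((t, frozenset(stopped), frozenset(started)))
    return events


speaking = [{0}, {0}, {1}, {1}, set(), {2}]
print(turn_taking_events(speaking))  # only the 0 -> 1 handover at t = 2 qualifies
```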

For the dynamical influence model, we model each person as an entity c, and the observed
audio variances at time t as $O_t^{(c)}$. Each person also has two hidden states, representing speaking
or not speaking. The hidden layer reduces errors due to noise and non-voiced speech in the audio
signals [16]. Therefore, influence here captures how each person's speaking/non-speaking
hidden states dynamically change other people's speaking/non-speaking states (i.e., how people
influence each other's turn taking). All parameters are initialized randomly and learned by the E-M
inference algorithm in this example. We train the dynamical influence model using data up to $t-1$,
sample observations at time t from it, and mark the chain that changes the most toward the
high-variance observations as the turn taker at t. The emission probability $\mathrm{Prob}(O_t^{(c)} \mid h_t^{(c)})$ is modeled
using a multinomial distribution and is estimated automatically during the E-M process.

For comparison, we also show results using TESLA and a nearest neighbor method. For TESLA,
we use the official implementation [23] to obtain the changing weights between pairs of nodes, and
we pick the node with the strongest correlation weight to other nodes at $t-1$ as the turn taker at
t. To predict the turn taking at time t with the nearest neighbor method, we look over all previous
instances of turn taking behavior that have the same speaker as the one at $t-1$, and predict the
most frequent outcome.
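The nearest neighbor baseline reduces to a frequency count. A minimal sketch follows, assuming past turn-taking instances are stored as (previous speaker, next speaker) pairs. The helper name and the `None` return for unseen speakers are illustrative choices.

```python
from collections import Counter


def nearest_neighbor_predict(history, current_speaker):
    """Predict the next speaker from past turn-taking instances.

    history: list of (previous_speaker, next_speaker) pairs observed so far.
    Returns the most frequent next speaker among past instances with the same
    previous speaker, or None if there is no such instance.
    """
    matches = Counter(nxt for prev, nxt in history if prev == current_speaker)
    if not matches:
        return None
    return matches.most_common(1)[0][0]


history = [("A", "B"), ("A", "C"), ("A", "B"), ("B", "A")]
print(nearest_neighbor_predict(history, "A"))  # B
```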



Table 2: Accuracy of different turn taking prediction methods on both the full dataset and the half
of the dataset with more complex interactions. The random guess accuracy is 33%. Human accuracy
is typically around 50% for similar tasks [28].

ACCURACY  | ALL SAMPLES                  | COMPLEX INTERACTION SAMPLES
METHODS   | DS+BS  DS+PS  CO+BS  CO+PS   | DS+BS  DS+PS  CO+BS  CO+PS
TESLA     | 0.41   0.42   0.32   0.25    | 0.44   0.37   0.37   0.17
NN        | 0.58   0.60   0.48   0.50    | 0.47   0.47   0.38   0.26
Ours(J=1) | 0.45   0.67   0.75   0.63    | 0.45   0.56   0.77   0.62
Ours(J=2) | 0.46   0.58   0.65   0.34    | 0.47   0.58   0.67   0.46
Ours(J=3) | 0.50   0.60   0.55   0.48    | 0.47   0.73   0.65   0.65


The accuracy of each algorithm is listed in Table 2. We also show the prediction accuracy for
the half of all samples that have more complex interactions, i.e., higher entropy. For the dynamical
influence approach, we list accuracies for J = 1 (which is simply the influence model), J = 2
and J = 3. Except for DS+BS, the dynamical influence model outperforms the other methods in all
categories for at least one value of J. This performance is quite good considering that we are using
only volume variance and that a human can only predict at around 50% accuracy for similar tasks [28].

More importantly, the dynamical influence model seems to perform much better than the competing
methods for more complex interactions. For simple interactions, J = 1 or even NN seems to perform
best, because there is little shift in the influence structure during the discussion. However, when
handling complex interaction processes, introducing switching influence dynamics can substantially
improve performance, as shown in Table 2 (for example, on the complex DS+PS samples, accuracy
rises from 0.56 with J = 1 to 0.73 with J = 3). This result suggests that the dynamical influence
assumption is reasonable and necessary in modeling complex group dynamics, and that it can improve
prediction accuracy substantially. However, in simple cases, the model achieves its highest performance
when J = 1 (i.e., the influence matrix is static), and a higher J may only lead to overfitting.

The fact that turn taking is predictable under our dynamical influence assumption is indeed surprising, because group turn-taking dynamics are complicated and also depend on content [4]. We think that the dynamical influence model tracks the two main mechanisms of group discussions noted by Gibson [4], even though the model does not incorporate the content of speaker statements. First, the time-variant assumption in the dynamical influence model captures the latency factor in group dynamics. Second, our abstract conditional-probability definition of influence is essentially a generalization of the conversational obligation mechanism Gibson described.

6.3 Modeling Flu Epidemics as Influence Dynamics 

The last example applies the dynamical influence model to weekly US flu activity data from Google Flu Trends [29]. All 50 U.S. states are divided into ten regions by geographic location, as shown in Fig. 4, and we model each region as an entity in the dynamical influence model.




Figure 4: Ten regions of the United States defined by US Health and Human Services. 

As the data are continuous, six hidden states are used for each chain, and $p(O_t^{(c)} \mid h_t^{(c)})$ is modeled with six Gaussian distributions, one per hidden state, with different means and the same variance. We set the six mean values by hand so that they represent six severity levels of the flu epidemic, from least to most severe. We train the model using the first 290 weeks (from 2003 to early 2009), and we show the posterior for $r_t$, the switching parameter, in Fig. 5, together with the three learned influence matrices. While there are many small peaks suggesting changes in influence, the probability changes dramatically around Christmas, which suggests that the influence patterns among these ten regions are very different during the holiday season. Note especially that we did not tell the model to search for different patterns on those days; instead, it reveals that transmission dynamics operate differently at a time when many people engage in once-a-year travel to visit family and friends. While other mechanisms may also be at work, alternative explanations such as the weather are not plausible because they do not change discontinuously during the holiday season.
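A sketch of this emission model follows: one Gaussian per hidden state, with hand-set means and a variance shared by all states. The six means and the standard deviation below are placeholders chosen for illustration. The values actually used for the flu data are not given here.

```python
import math

# Placeholder severity levels: NOT the values used for the flu data.
MEANS = [0.5, 1.0, 2.0, 3.0, 4.5, 6.0]
SIGMA = 0.5  # standard deviation shared by all six hidden states (assumed)


def emission_likelihoods(x, means=MEANS, sigma=SIGMA):
    """Return p(x | h = k) for every hidden state k, with x ~ N(means[k], sigma^2)."""
    norm = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    return [norm * math.exp(-0.5 * ((x - m) / sigma) ** 2) for m in means]


def most_likely_level(x):
    """Index of the severity level whose Gaussian best explains observation x.
    With a shared variance this is simply the level with the nearest mean."""
    lik = emission_likelihoods(x)
    return max(range(len(lik)), key=lik.__getitem__)


print(most_likely_level(2.2))  # 2
```

Sharing one variance keeps the states ordered purely by their means, so each hidden state reads directly as a severity level.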

Influence matrix 1 captures the dynamics during holiday seasons, while influence matrix 2 captures the dynamics during normal seasons. Row i corresponds to region i in Fig. 4. Consider Row 1, the New England region. During normal times, as shown in the first row of influence matrix 2, New England is more likely to be influenced by close regions such as 3 and 4; during holiday seasons, New England is more likely to be influenced by all regions, especially distant regions such as region 9. Similar patterns hold for the other regions as well.
Figure 5: The inferred posterior for $r_t$ given all observations after convergence. While there are many small peaks indicating changes in influence, the largest peaks occur during the Christmas holiday seasons, which implies that holiday traffic patterns can have a big effect on flu transmissibility between regions. We find that three configuration matrices are sufficient to capture the flu dynamics.



7 Discussion 

In this article we described the influence model and its generalization, the dynamical influence model, and we showed how they can be applied to a variety of social signals to infer how entities in networks affect one another. In particular, the resulting influence matrix R connects the underlying network to the stochastic process of state transitions. The switching matrices R(1), ..., R(J) go further and bridge state transitions to time-varying networks.

The influence model shares the limitations of other machine learning models: inference requires sufficient training data, and tuning is necessary for the best results. Future work includes incorporating known network data into the model to boost performance.

The most important limitation is that we are attempting to infer causal processes from observational data in which many mechanisms are likely at play. If we find that the behavior of two individuals is correlated, it could be due to influence, but it could also be due to selection (I choose to interact with people like me) or to contextual factors (you and I are both influenced by an event or a third party not in the data). It has recently been shown that these mechanisms are generically confounded [11], but it is important to remember that this does not make observational data worthless. It just means that we should carefully consider alternative mechanisms that may underlie correlated behavior. Because we have time data to test causal ordering and asymmetries in network relationships to test the direction of effects, we can have greater (but not complete!) confidence than we would if we only had cross-sectional data from symmetric relationships.
8 Appendix A: Model Learning

Here we give the detailed steps of our variational E-M algorithm. Definitions are denoted by $\triangleq$, and $\propto$ denotes equality up to normalization: the right-hand side must be normalized accordingly.

8.1 E-Step 

We adopt a procedure similar to the forward-backward procedure in the HMM literature. We compute the following forward parameters for $t = 1, \ldots, T$:



where $O_{1:t}$ denotes $\{O_1, O_2, \ldots, O_t\}$. However, exact inference is intractable, so we apply the variational approach in [30, 27]. The variational E-M process is still guaranteed to converge because of the lower-bound property of the variational method [27]. We decouple the chains by:



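$$\alpha_{t,c} \triangleq Prob(h_t^{(c)} \mid r_t, O_{1:t}) \qquad (15)$$

$$\kappa_t \triangleq Prob(r_t \mid O_{1:t}) \qquad (16)$$

$$Prob(h_t^{(1..C)} \mid O_{1:t}, r_t) \approx \prod_{c} Q(h_t^{(c)} \mid O_{1:t}, r_t) \qquad (17)$$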



and naturally:

(18)

We define:

(19)

and

(20)

where:

$$Prob(r_{t-1} \mid O_{1:t-1}, r_t). \qquad (21)$$
We define $Q(h_t^{(c)})$ to be of the form

$$Q(h_t^{(c)} \mid O_{1:t}, r_t) \triangleq \frac{Prob(h_t^{(c)} \mid O_{1:t-1}, r_t)\, s_t^{(c)}}{\sum_{h_t^{(c)}} Prob(h_t^{(c)} \mid O_{1:t-1}, r_t)\, s_t^{(c)}},$$

which captures both the evidence from previous states, $Prob(h_t^{(c)} \mid O_{1:t-1}, r_t)$, and the evidence $s_t^{(c)}$ from observations. We then have:

$$Q(h_t^{(1..C)} \mid O_{1:t}, r_t) = \prod_c \frac{\Psi(c)\, s_t^{(c)}}{\sum_{h_t^{(c)}} \Psi(c)\, s_t^{(c)}}, \qquad (22)$$

where $\Psi(c)$ is $Prob(h_t^{(c)} \mid O_{1:t-1}, r_t)$. We also have



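$$Prob(h_t^{(1..C)}, O_t \mid O_{1:t-1}, r_t) = \prod_c \Big( \Big[ R_{c,c}(r_t)\, (E^{(c)})^{\top} \alpha^{r_t}_{t-1,c} + \sum_{c' \neq c} R_{c,c'}(r_t)\, (F^{(c',c)})^{\top} \alpha^{r_t}_{t-1,c'} \Big]_{h_t^{(c)}} Prob(O_t^{(c)} \mid h_t^{(c)}) \Big), \qquad (23)$$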



and

$$Prob(h_t^{(1..C)} \mid O_{1:t}, r_t) = \frac{Prob(h_t^{(1..C)}, O_t \mid O_{1:t-1}, r_t)}{Prob(O_t \mid O_{1:t-1}, r_t)}. \qquad (24)$$
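The per-chain factor in Eq. 23 is an influence-model mixture. The predicted state of chain c mixes its own transition E^(c) with the transitions F^(c',c) driven by the other chains, weighted by row c of R(r_t). The sketch below computes this factor together with the evidence Prob(O_t | O_{1:t-1}, r_t) that normalizes Eq. 24. It rests on three assumptions: the product form of Eq. 23, that R_{c,c} weights the self-transition, and nested-list layouts for E, F and the forward vectors. It is illustrative, not the authors' implementation.

```python
def chain_prior(c, R, E, F, alpha_prev):
    """Predicted state distribution of chain c at t (the per-chain factor of Eq. 23).

    R[c][cp]          : influence weight R_{c,c'}(r_t); each row sums to 1
    E[c][i][j]        : self-transition of chain c from state i to state j
    F[cp][c][i][j]    : transition of chain c driven by chain cp (cp != c)
    alpha_prev[cp][i] : forward probability that chain cp is in state i at t-1
    Assumption: R[c][c] weights the self-transition E[c].
    """
    S = len(E[c][0])
    prior = [0.0] * S
    for cp, weight in enumerate(R[c]):
        T = E[c] if cp == c else F[cp][c]
        for i, a in enumerate(alpha_prev[cp]):
            for j in range(S):
                prior[j] += weight * a * T[i][j]
    return prior


def evidence(priors, emis):
    """Prob(O_t | O_{1:t-1}, r_t) under the product form of Eq. 23:
    product over chains of sum_h prior_c[h] * Prob(O_t^(c) | h)."""
    total = 1.0
    for prior, e in zip(priors, emis):
        total *= sum(p * q for p, q in zip(prior, e))
    return total


# Two binary chains: chain 1 simply flips whatever chain 0 did.
ident, flip = [[1, 0], [0, 1]], [[0, 1], [1, 0]]
E = [ident, ident]
F = [[None, flip], [flip, None]]  # F[c][c] is never used
R = [[0.5, 0.5], [1.0, 0.0]]
alpha_prev = [[1.0, 0.0], [1.0, 0.0]]
priors = [chain_prior(c, R, E, F, alpha_prev) for c in range(2)]
print(priors)  # [[0.5, 0.5], [0.0, 1.0]]
```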

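We then minimize the K-L divergence between $Q(h_t^{(1..C)} \mid O_{1:t}, r_t)$ and $Prob(h_t^{(1..C)} \mid O_{1:t}, r_t)$ over $s_t^{(c)}$, that is, we seek $\arg\min_{s_t^{(c)}} \mathrm{KL}(Q \,\|\, Prob)$, where

$$
\begin{aligned}
\mathrm{KL}(Q \,\|\, Prob) &= E_Q\big(\log Q(h_t^{(1..C)} \mid O_{1:t}, r_t)\big) - E_Q\big(\log Prob(h_t^{(1..C)} \mid O_{1:t}, r_t)\big) \\
&= E_Q\Big(\sum_c \log \Psi(c) + \sum_c \log s_t^{(c)} - \sum_c \log \sum_{h_t^{(c)}} \Psi(c)\, s_t^{(c)}\Big) \\
&\quad - E_Q\Big(\sum_c \log \Psi(c) + \sum_c \log Prob(O_t^{(c)} \mid h_t^{(c)})\Big) + \underbrace{\log Prob(O_t \mid O_{1:t-1}, r_t)}_{\text{unrelated to } s_t^{(c)}}. \qquad (25)
\end{aligned}
$$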



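By taking the derivative with respect to $s_t^{(c)}$ and setting it to zero, we have:

$$\frac{\partial\, \mathrm{KL}}{\partial s_t^{(c)}} = 0 \;\Longrightarrow\; s_t^{(c)} = Prob(O_t^{(c)} \mid h_t^{(c)}). \qquad (26)$$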
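Eqs. 25 and 26 can be checked by brute force on a toy two-chain example. The sketch assumes the joint factorizes as in Eq. 23, i.e., as a product over chains of the per-chain prior Prob(h_t^(c) | O_{1:t-1}, r_t) and the emission term. Under that assumption, setting s_t^(c) to the emission likelihood makes the factorized Q coincide with the exact posterior, so the K-L divergence is zero; any other choice gives a positive divergence. All numbers are arbitrary.

```python
import itertools
import math


def q_factor(prior, s):
    """Q(h^(c)) proportional to prior[h] * s[h], normalized over h (form of Eq. 22)."""
    w = [p * x for p, x in zip(prior, s)]
    z = sum(w)
    return [x / z for x in w]


def kl_q_to_posterior(priors, emis, ss):
    """KL(Q || Prob(h^(1..C) | O_{1:t}, r_t)), by enumerating every joint state.
    Assumes Prob(h, O_t | O_{1:t-1}, r_t) = prod_c prior_c[h_c] * emis_c[h_c]."""
    C = len(priors)
    qs = [q_factor(p, s) for p, s in zip(priors, ss)]
    states = list(itertools.product(*[range(len(p)) for p in priors]))
    joint = [math.prod(priors[c][h[c]] * emis[c][h[c]] for c in range(C)) for h in states]
    z = sum(joint)  # Prob(O_t | O_{1:t-1}, r_t)
    kl = 0.0
    for h, pj in zip(states, joint):
        q = math.prod(qs[c][h[c]] for c in range(C))
        if q > 0:
            kl += q * math.log(q / (pj / z))
    return kl


priors = [[0.3, 0.7], [0.6, 0.4]]
emis = [[0.9, 0.2], [0.1, 0.5]]
print(abs(kl_q_to_posterior(priors, emis, emis)) < 1e-12)     # True: s = emission (Eq. 26)
print(kl_q_to_posterior(priors, emis, [[1, 1], [1, 1]]) > 0)  # True: s ignores the data
```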

We then compute $\kappa_t$ using Bayes' rule:

$$\kappa_t \propto Prob(O_t \mid r_t, O_{1:t-1})\, Prob(r_t \mid O_{1:t-1}), \qquad (27)$$

where $Prob(O_t \mid r_t, O_{1:t-1})$ can be evaluated using the previous approximation results. The prior part of Eq. 27 can be evaluated using $V$ and $\kappa_{t-1}$.
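Eq. 27 is a forward-filtering step for the switching variable. In the sketch, the prior part is computed as sum_i V[i][j] kappa_{t-1}(i), assuming V[i][j] = Prob(r_t = j | r_{t-1} = i). The likelihood Prob(O_t | r_t = j, O_{1:t-1}) for each configuration j is taken as given, for example from the normalizer of Eq. 24. The function name is hypothetical.

```python
def kappa_update(kappa_prev, V, lik):
    """One forward step for the switching variable (Eq. 27, sketch).

    kappa_prev[i] : Prob(r_{t-1} = i | O_{1:t-1})
    V[i][j]       : Prob(r_t = j | r_{t-1} = i)  (assumed layout)
    lik[j]        : Prob(O_t | r_t = j, O_{1:t-1})
    Returns kappa_t with kappa_t[j] = Prob(r_t = j | O_{1:t}).
    """
    J = len(lik)
    prior = [sum(kappa_prev[i] * V[i][j] for i in range(J)) for j in range(J)]
    unnorm = [lik[j] * prior[j] for j in range(J)]
    z = sum(unnorm)
    return [u / z for u in unnorm]


# Sticky switching process; the new observation favors configuration 1.
V = [[0.9, 0.1], [0.1, 0.9]]
print(kappa_update([0.5, 0.5], V, [0.2, 0.8]))
```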

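Using the same idea, we can compute the following backward parameters for all $t$:

$$\beta_{t,c} \triangleq Prob(h_t^{(c)} \mid r_t, O_{t:T}), \qquad (28)$$

$$u_t \triangleq Prob(r_t \mid O_{t:T}), \qquad (29)$$

$$\gamma_{t,c} \triangleq Prob(h_t^{(c)} \mid O_{t:T}) = \sum_{r_t} u_t\, \beta_{t,c}. \qquad (30)$$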

8.2 M-step

With $\kappa_t$ and $u_t$, we can estimate:

$$\xi^t_{ij} \triangleq Prob(r_t = i, r_{t+1} = j \mid O_{1:T}) = \frac{Prob(r_t = i \mid O_{1:t})\, Prob(r_{t+1} = j \mid O_{t+1:T})\, Prob(r_{t+1} = j \mid r_t = i)}{\sum_{i',j'} Prob(r_t = i' \mid O_{1:t})\, Prob(r_{t+1} = j' \mid O_{t+1:T})\, Prob(r_{t+1} = j' \mid r_t = i')} \qquad (31)$$

and

$$\lambda^i_t \triangleq Prob(r_t = i \mid O_{1:T}). \qquad (32)$$

We then update $V$ by:

$$V_{ij} \leftarrow \frac{\sum_t \xi^t_{ij} + k}{\sum_t \lambda^i_t + p_v}, \qquad (33)$$

where $k = p_v$ if $i = j$, and $k = 0$ otherwise.
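Eq. 33 is an expected-count update with an extra pseudo-count p_v on the diagonal, which favors staying in the same influence configuration. The sketch assumes the pairwise posteriors Prob(r_t = i, r_{t+1} = j | O_{1:T}) of Eq. 31 are supplied as a list over t of J×J nested lists. Each such matrix sums over j to Prob(r_t = i | O_{1:T}), so every row of the updated V sums to one.

```python
def update_V(xi, p_v):
    """M-step for the switching matrix V (Eq. 33, sketch).

    xi[t][i][j] : Prob(r_t = i, r_{t+1} = j | O_{1:T}) for each transition t
    p_v         : pseudo-count added only on the diagonal (i == j)
    """
    J = len(xi[0])
    V = []
    for i in range(J):
        counts = [sum(x[i][j] for x in xi) + (p_v if i == j else 0.0) for j in range(J)]
        total = sum(counts)  # = sum_t Prob(r_t = i | O_{1:T}) + p_v
        V.append([c / total for c in counts])
    return V


xi = [[[0.6, 0.1], [0.1, 0.2]], [[0.5, 0.2], [0.0, 0.3]]]
print(update_V(xi, p_v=1.0))
```

A larger p_v makes the learned switching process stickier, which suppresses spurious short-lived switches.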

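We compute the following joint distribution:

$$
Prob(h_t^{(c')}, h_{t+1}^{(c)}, g_{t+1}^{(c)} = c', r_{t+1} \mid O_{1:T}) =
\begin{cases}
\frac{1}{Z}\, \alpha_{t,c}(h_t^{(c)})\, E^{(c)}_{h_t^{(c)}, h_{t+1}^{(c)}}\, \beta_{t+1,c}(h_{t+1}^{(c)})\, Prob(r_{t+1} \mid O_{1:T})\, Prob(g_{t+1}^{(c)} = c \mid r_{t+1}), & \text{if } c' = c, \\
\frac{1}{Z}\, \alpha_{t,c'}(h_t^{(c')})\, F^{(c',c)}_{h_t^{(c')}, h_{t+1}^{(c)}}\, \beta_{t+1,c}(h_{t+1}^{(c)})\, Prob(r_{t+1} \mid O_{1:T})\, Prob(g_{t+1}^{(c)} = c' \mid r_{t+1}), & \text{otherwise.}
\end{cases}
\qquad (34)
$$

$Z$ denotes the normalization factor, which can be calculated easily by summing over all possible values of each variable. This is fast to compute since the joint distribution involves only four variables.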



By marginalizing Eq. 34, we can update the parameters $R$, $E$ and $F$:

$$R_{c_1,c_2}(i) \leftarrow \frac{\sum_t Prob(g_t^{(c_1)} = c_2, r_t = i \mid O_{1:T})}{\sum_t \sum_c Prob(g_t^{(c_1)} = c, r_t = i \mid O_{1:T})}$$
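The R update is a normalized expected count. The sketch assumes the marginals Prob(g_t^(c1) = c2, r_t = i | O_{1:T}), obtained by marginalizing Eq. 34, are stored as `post[t][c1][c2][i]`, a hypothetical layout. The uniform fallback for a row with zero mass is our own addition.

```python
def update_R(post, i):
    """Re-estimate R(i) from posterior marginals (sketch).

    post[t][c1][c2][k] : Prob(g_t^(c1) = c2, r_t = k | O_{1:T})
    Returns R with R[c1][c2] proportional to sum_t post[t][c1][c2][i],
    normalized over c2 so that each chain's influence weights sum to 1.
    """
    C = len(post[0])
    R = []
    for c1 in range(C):
        num = [sum(p[c1][c2][i] for p in post) for c2 in range(C)]
        den = sum(num)
        # Our own fallback (not in the text): uniform weights if the row has no mass.
        R.append([x / den for x in num] if den > 0 else [1.0 / C] * C)
    return R


# One time step, two chains, a single configuration (i = 0).
post = [[[[0.3], [0.1]], [[0.2], [0.2]]]]
print(update_R(post, 0))
```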
9 Appendix B: Detecting Structural Changes in the Discussion Dynamics



Here we provide an additional example of detecting structural changes, using the same dataset described in Section 5.

One important feature of this model is its ability to capture changes in influence dynamics given only the observed time series of each node. In this section, we demonstrate the performance of our model in detecting such changes in the group discussion dataset.

In our discussion, a sample refers to the set of four sequences collected by the four badges deployed in a group discussion session. We adopt the following evaluation procedure. For each four-person group, one mixed binary audio sample is generated by concatenating the co-located discussion session sample and the distributed discussion session sample of the same group. It is known [5] that the interaction pattern in a distributed discussion session often differs from that in a co-located session. Therefore, the point at which one session sample ends and the other begins gives us ground truth for a change in influence patterns that we created manually. Note that we use only binary sequences obtained by thresholding the volume variance, which eliminates all information about the audio content. Two samples from each group are included in our final evaluation set: a) the original sample of the co-located discussion session (CO) and b) the mixed sample described above (CO+DS). We end up with a total of 28 groups and 56 samples in the final set. Sample lengths vary from 100 seconds to 500 seconds.
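The preprocessing described above can be sketched as follows. The window length, the threshold and the function names are hypothetical. The text only states that the binary sequences come from thresholding the volume variance and that the CO and DS samples of a group are concatenated.

```python
import statistics


def binarize(volume, window=4, threshold=0.01):
    """Hypothetical preprocessing: split a badge's volume trace into
    non-overlapping windows and mark a window as speaking (1) when the
    variance of the volume inside it exceeds the threshold."""
    return [
        1 if statistics.pvariance(volume[k:k + window]) > threshold else 0
        for k in range(0, len(volume) - window + 1, window)
    ]


def mixed_sample(co, ds):
    """CO+DS sample of one group: each badge's co-located sequence followed by
    the same badge's distributed sequence."""
    if len(co) != len(ds):
        raise ValueError("CO and DS samples must come from the same badges")
    return [a + b for a, b in zip(co, ds)]


print(binarize([0, 0, 0, 0, 0, 1, 0, 1]))  # [0, 1]
```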

We apply our model to both samples of each group. The emission probability in our model is used to tolerate possible errors due to hard thresholding and noise. We choose J = 2, and $p_v$ is optimized for best performance. The posterior of $r_t$ for the two samples from each group is stored as the output of the algorithm.

Figure 6: The accuracy rates for classifying CO+DS samples versus CO samples. Our algorithm performs significantly better than the other two methods, which are based on simple statistical features.

We then develop a simple heuristic for distinguishing CO+DS from CO: for each sample, we compute the difference of the expected influence matrix $\sum_j Prob(r_t = j \mid O_{1:T})\, R(j)$ between t = 1 and t = 0.8T, and the sample with the larger difference is labeled as the CO+DS sequence. Given the pair of samples for each group, we test the labeling accuracy based on the output of our model. For comparison, we also implement two other techniques: a) classification based on a single feature, the turn-taking rate, and b) S.V.M.-based classification (using the implementation in [31]). It is well recognized that the turn-taking rate is an important indicator of group dynamics. We compute the turn-taking rates of the two samples in each pair and compare them to determine sample labels. For the S.V.M., we use the turn-taking rate and the speaking duration of each group member as the feature vector for each sample, and its performance is obtained via four-fold cross-validation. It should be emphasized that the S.V.M. classification task differs from the other two and is naturally more challenging: all samples are mixed together before being fed to the S.V.M., whereas the other two algorithms receive the samples in pairs.
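The labeling heuristic can be sketched as below. The expected influence matrix at time t is taken as sum_j Prob(r_t = j | O_{1:T}) R(j). Two details are our assumptions: the Frobenius norm as the measure of "difference" and the rounding of t = 0.8T to a list index. The inputs `post_r` (the posterior of r_t, one row per time step) and `Rs` (the J learned matrices of one sample) are hypothetical names.

```python
import math


def expected_R(post_r_t, Rs):
    """Expected influence matrix at one time step: sum_j Prob(r_t = j | O_{1:T}) * R(j)."""
    C = len(Rs[0])
    return [[sum(w * R[a][b] for w, R in zip(post_r_t, Rs)) for b in range(C)]
            for a in range(C)]


def influence_shift(post_r, Rs):
    """Frobenius norm of the change in the expected influence matrix between
    t = 1 and t = 0.8T (1-based times mapped to 0-based list indices)."""
    T = len(post_r)
    early = expected_R(post_r[0], Rs)
    late = expected_R(post_r[max(int(0.8 * T) - 1, 0)], Rs)
    return math.sqrt(sum((x - y) ** 2
                         for row_e, row_l in zip(early, late)
                         for x, y in zip(row_e, row_l)))


def label_pair(post_a, Rs_a, post_b, Rs_b):
    """Label as CO+DS the sample ('a' or 'b') whose influence shifts more."""
    return "a" if influence_shift(post_a, Rs_a) > influence_shift(post_b, Rs_b) else "b"


# Toy example with J = 2 configurations for a two-person group.
Rs = [[[1, 0], [0, 1]], [[0, 1], [1, 0]]]
post_a = [[1, 0]] * 3 + [[0, 1]] * 2  # switches configuration at t = 4
post_b = [[1, 0]] * 5                 # never switches
print(influence_shift(post_a, Rs), label_pair(post_a, Rs, post_b, Rs))  # 2.0 a
```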

We must point out that the ground truth in our evaluation may not be accurate: nothing in the dataset guarantees that a group of people behaves and interacts differently when holding discussions through remote communication tools rather than in the same room.

We illustrate the accuracy rates in Fig. 6. As expected, our algorithm reaches 71% accuracy and outperforms the other two methods. We argue that the influence dynamic is an intrinsic property of the group that cannot be fully revealed by simple statistical analysis of observable features. To investigate and visualize the dynamical characteristics of human interaction patterns, a more sophisticated model, such as our dynamical influence process, must be deployed to reveal the subtle differences in influence dynamics.

In addition, we claim that our model is capable of modeling, quantifying and tracking occurrences of such shifts in face-to-face dynamics accurately. Our model fits its parameters to best suit switches between different influence patterns, and these parameters will help sociologists objectively investigate micro-level relationships in a group discussion session. Information discovered by our algorithm will also be useful in applications such as understanding possible interventions in human interactions [5, 32].